Semantic Document Selection - Historical Research on Collections That Span Multiple Centuries

Authors

  • Daan Odijk
  • Ork de Rooij
  • Maria-Hendrike Peetz
  • Toine Pieters
  • Maarten de Rijke
  • Stephen Snelders

Abstract

The availability of digitized collections of historical data, such as newspapers, increases every day, and with it grows historians' wish to explore these collections. Methods traditionally used to examine a collection do not scale up to today’s collection sizes. We propose a method that combines text mining with exploratory search to provide historians with a means of interactively selecting and inspecting relevant documents from very large collections. We assess our proposal with a case study on a prototype system.
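The abstract does not describe the pipeline itself. Purely as an illustration, and assuming a collection of dated plain-text items, the sketch below shows one way text mining and exploratory search might fit together in an interactive loop: filter by query terms and a date range, summarize the selection as a per-decade timeline, and surface frequent terms as candidate refinements. The `Doc` class, the functions, and the toy data are all hypothetical; this is not the authors' prototype system.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical record: one digitized item (e.g. a newspaper article) with its year."""
    doc_id: str
    year: int
    text: str


def tokenize(text):
    """Lowercase word tokens; a deliberately naive stand-in for real text mining."""
    return re.findall(r"\w+", text.lower())


def select(docs, query_terms, start_year, end_year):
    """Keep documents dated within [start_year, end_year] that contain every query term."""
    terms = {t.lower() for t in query_terms}
    return [
        d for d in docs
        if start_year <= d.year <= end_year and terms <= set(tokenize(d.text))
    ]


def timeline(docs, bucket=10):
    """Count selected documents per time bucket (decades by default), oldest first."""
    counts = Counter(d.year // bucket * bucket for d in docs)
    return dict(sorted(counts.items()))


def top_terms(docs, k=5, stopwords=frozenset()):
    """Most frequent terms in the selection, offered as candidate query refinements."""
    counts = Counter(t for d in docs for t in tokenize(d.text) if t not in stopwords)
    return [t for t, _ in counts.most_common(k)]


# Toy usage: narrow a small collection to 19th-century items mentioning "cholera".
docs = [
    Doc("a", 1851, "Cholera outbreak reported in the harbour"),
    Doc("b", 1866, "New cholera cases; quarantine ordered"),
    Doc("c", 1866, "Harbour trade resumes"),
    Doc("d", 1923, "Cholera vaccine trial announced"),
]
selection = select(docs, ["cholera"], 1800, 1899)
print([d.doc_id for d in selection])  # ['a', 'b']
print(timeline(selection))            # {1850: 1, 1860: 1}
print(top_terms(selection, k=1, stopwords={"in", "the"}))  # ['cholera']
```

In this toy design each step is cheap to recompute, which is what would make an interactive loop plausible: the user adjusts the terms or the date range, and the timeline and term list update immediately. A real system over millions of documents would need an inverted index rather than a linear scan.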

Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Finding Centuries-Old Hyperlinks: a Novel Semi-Supervised Shape Classifier

Hyperlinks are so useful for searching and browsing modern digital collections that researchers have long wondered whether it is possible to retroactively add hyperlinks to digitized historical documents. There has already been significant research into this endeavor for historical text; however, in this work we consider the problem of adding hyperlinks among graphic elements. While such a system ...

SHAX: The Semantic Historical Archive eXplorer

Newspaper archives are some of the richest historical document collections. Their study is, however, very tedious: one needs to physically visit the archives, search through reams of old, very fragile paper, and manually assemble cross-references. We present Shax, a visual newspaper-archive exploration tool that takes large, historical archives as an input and allows interested parties to brows...

Conceptual changes of Mihrab, emphasizing on third and fourth century AH sources

The mihrab existed before Islam, and after the Islamic conquest it became one of the main components of Islamic mosques. Changes in the structure and function of the mihrab across these two periods can be studied by various methods, one of which, in historical studies, is the examination of conceptual change. This article aims to investigate a part of conceptual changes that reflects Mihrab’s structural an...

Publication date: 2012